Abstract: In opportunistic networks, the use of social metrics
(e.g., degree, closeness and betweenness centrality) of the human
mobility network has recently been shown to be an effective
solution to improve the performance of opportunistic forwarding 
algorithms. Most of the current social-based forwarding schemes 
exploit some globally defined node centrality, resulting in a bias 
towards the most popular nodes. However, these nodes may not be 
appropriate relay candidates for some target nodes, because they 
may have low importance relative to these subsets of target nodes. 
In this paper, to improve the opportunistic forwarding efficiency, 
we exploit the relative importance (called partial centrality) of 
a node with respect to a group of nodes. We design a new 
opportunistic forwarding scheme, opportunistic forwarding with 
partial centrality (OFPC), and theoretically quantify the influence 
of the partial centrality on the data forwarding performance 
using graph spectrum. By applying our scheme to three real
opportunistic networking scenarios, we show through extensive
evaluation that it achieves significantly better mean delivery
delay and cost than the state-of-the-art works, while achieving
delivery ratios close to those of Epidemic under different TTL
requirements.

I. Introduction 

The paradigm of opportunistic forwarding has been proposed to
serve emerging wireless networking applications, where nodes
experience intermittent connectivity [1], [2]. In such scenarios,
to transmit messages to a distant destination under a given delay
bound, node mobility is exploited to let nodes broker information
exchange between disjoint parts of the network [3]. Therefore, the
main challenge for opportunistic forwarding is to make an effective
forwarding decision, such that the chosen relays have the highest
cumulative probability of reaching the destination within the
delay bound.

In most of the early works, due to the unstable end-to-end path
and the lack of global knowledge of the network topology, data
forwarding decisions are generally made by adopting various
heuristics, such as inferring the likelihood of forwarding the
message (e.g., [4]-[6]), employing the contact locations (e.g.,
[7]), or focusing on the contact frequencies [8]. These solutions
base packet delivery on predictions of the physical contact
metrics^1 of nodes. We argue that such solutions are not cost
effective for opportunistic scenarios, since these simple metrics
only reflect one facet of the underlying mobility process. On the
other hand, with the recent popularization of personal hand-held
mobile devices, human walks heavily affect the network performance
[9], [10], e.g., devices may lose connection when people move
around (in the rest of this paper, without loss of generality, we
use the terms "people" and "node" interchangeably). We believe that
the social contact metrics obtained by complex network analysis
[11] capture the inherent characteristics of the network structure,
and are less volatile than the physical contact metrics. Motivated
by this observation, in this paper we focus on integrating social
metrics into opportunistic forwarding algorithms. This design turns
out to be critical yet challenging, especially in an intermittently
connected environment.

1 In this paper, we use the term "physical contact metrics" to
denote the contact pattern of a node with others, such as contact
time/frequency/location, and the term "social contact metrics" to
denote those of similarity, centrality or community, etc.

Recently, there have been a few attempts to explicitly use social
metrics to formulate the opportunistic forwarding decision. Among
them, SimBet [12], Bubble [13] and PeopleRank [14] are the three
most recent works. Although the detailed forwarding schemes differ,
all of them are motivated by the following two observations from
society: 1) people with closer relationships tend to reside in
communities and 2) people within a community may have different
popularity. As such, increasingly "popular" or "central" nodes are
more likely to be chosen as carriers to relay messages between
disconnected communities [14], [13], until a node belonging to the
same community as the destination is reached [12], [13].
Intuitively, information about community structure and node
popularity enables them to outperform well-known opportunistic
forwarding algorithms that are not explicitly "social based".

Nevertheless, we notice that all three protocols prefer to use
global measures of node centrality (e.g., exploiting ego networks
in [12], betweenness centrality in [13] and the PageRank [16]
algorithm in [14]), in that each node is ranked with respect to all
other nodes in the network. We argue that popular nodes with high
global centrality may not be appropriate relay candidates, because
such nodes may have low importance relative to the specific subset
of nodes to which the destination belongs. Interestingly, nodes
with low global centrality but high relative importance to the
community partners of the destination bear the most weight on
routing performance. In other words, this relative importance
captures fine-grained relations among nodes. Thus, it helps to make
informed forwarding decisions (e.g., a node is the desired relay if
it exhibits high relative importance to the destination's community
partners).

To this end, we first employ the partial centrality metric to
measure the relative importance of a node with respect to the nodes
within a community. We then develop an opportunistic forwarding
scheme by jointly considering the partial centrality metric and the
community structure, to improve the opportunistic forwarding
efficiency. We summarize our contributions as follows:

• We evaluate the performance of opportunistic routing based on
the partial centrality metric. To the best of our knowledge, this
is the first attempt to integrate this social metric into
opportunistic routing.

• We propose an online method to compute a node's partial
centrality in a distributed fashion, which makes our work more
applicable. We also detect the overlapped community structure by
effectively distinguishing the bridging nodes from other nodes, and
exploit the community structure to label the community partners of
the destination.

• We formulate the strength of the relationship between nodes as a
Decayed Sum Problem [18], and use a Decayed Aggregation Graph (DAG)
to model the dynamics of the network topology.

• We implement OFPC and compare it to several state-of-the-art
works through three real opportunistic networking scenarios. Our
extensive evaluation results show that, overall, OFPC outperforms
the other solutions, especially in terms of mean delivery delay and
cost. For example, it achieves up to a 70% improvement in mean
delivery delay over Prophet [4] and 40% over Bubble [13], and
reduces cost by factors of up to 2 and 3 compared to Bubble and
Prophet, respectively, in one network example.

We organize the remainder of this paper as follows. Section II
overviews the problem and network model. Section III describes the
forwarding scheme. Section IV evaluates its performance. After
briefly reviewing the related work in Section V, we draw our
conclusions in Section VI.

II. PRELIMINARIES 

A. Centrality and Partial Centrality Metrics 

Node centrality reflects the importance of a node relative to all
other nodes in the network (i.e., how popular a person is within a
social network), while node partial centrality measures the
relative importance of a node with respect to the nodes within a
group. Freeman [19] proposed the three most widely used methods to
estimate centrality, called the degree, closeness and betweenness
measures. We here take the degree measure as an example to
illustrate the difference between the centrality metric and the
partial centrality metric. Degree centrality is measured as the
number of one-hop neighbors of a given node u, which reflects the
direct relationship between node u and its neighbors. In general, a
node with a higher degree centrality can directly contact more
other nodes. Node u's degree centrality is computed as:

    $D_u^{c} = \sum_{v=1, v \neq u}^{n} \delta_{uv}$    (1)

where n is the number of nodes in the network and $\delta_{uv} = 1$
if node v is one of the neighbors of node u; otherwise,
$\delta_{uv} = 0$.
Fig. 1. Centrality and partial centrality metrics in two different
situations: (a) $D_u^{c} = 5$ and $D_v^{c} = 4$; (b) $D_u^{c} =
D_v^{c} = 5$. In both situations, $D_u^{pc(C_2)} = 2$ and
$D_v^{pc(C_2)} = 3$ (suppose the destination belongs to the
community $C_2$).

Similar to Eq. (1), a node's partial centrality is

    $D_u^{pc(C_k)} = \sum_{v=1, v \neq u}^{\|C_k\|} \delta_{uv}$    (2)

where $\|C_k\|$ is the number of nodes in community $C_k$ and
$\delta_{uv} = 1$ if and only if 1) node v belongs to community
$C_k$ and 2) node v is one of the neighbors of node u.

The main difference between the centrality metric and the partial
centrality metric is illustrated in Fig. 1, where node u needs to
make a forwarding decision when it encounters node v. Under the
traditional centrality-based forwarding schemes [12], [13], [14],
node u will not forward the messages destined to community C2 to
node v, mainly because the centrality of node u is greater than or
equal to that of node v in the two situations, respectively. In
contrast, if the forwarding schemes were guided by the partial
centrality metric, node v would be a better relay, since its
partial centrality is greater than that of node u.
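
The contrast between the two degree-based metrics can be illustrated
with a small sketch. The neighbor sets and node names below are
hypothetical and chosen only so that the counts match situation (a)
described for Fig. 1; the helper names are illustrative, not part of
the scheme.

```python
def degree_centrality(neighbors, u):
    """Eq. (1): count every one-hop neighbor of u in the whole network."""
    return sum(1 for v in neighbors[u] if v != u)


def partial_centrality(neighbors, u, community):
    """Eq. (2): count only the neighbors of u that belong to `community`."""
    return sum(1 for v in neighbors[u] if v != u and v in community)


# Hypothetical communities and neighbor sets (assumed for illustration).
C1 = {"a", "b", "c"}
C2 = {"x", "y", "z"}
neighbors = {
    "u": {"a", "b", "c", "x", "y"},  # 5 neighbors, 2 of them in C2
    "v": {"a", "x", "y", "z"},       # 4 neighbors, 3 of them in C2
}

global_u = degree_centrality(neighbors, "u")
global_v = degree_centrality(neighbors, "v")
partial_u = partial_centrality(neighbors, "u", C2)
partial_v = partial_centrality(neighbors, "v", C2)
```

With these inputs, node u ranks higher on the global metric while
node v ranks higher relative to C2, which is the case where the two
metrics lead to opposite forwarding decisions.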

B. Network Model 

In this paper, we model an opportunistic network as a Decayed
Aggregation Graph (DAG) G = (V, E), where V denotes the set of
nodes and E denotes the set of edges. Let $W(t) =
(w_{uv}(t))_{n \times n}$ denote its adjacency matrix and
$N_{uv}(t) = \{(on_i, off_i),\ i = 1, 2, \ldots, N\}$ denote the
contact series between nodes u and v at moment t, where the tuple
$(on_i, off_i)$ denotes the start moment and end moment of the ith
contact, respectively, and N is the number of contacts. We
formulate the computation of the strength of the relationship
between nodes (i.e., the value of $w_{uv}(t)$) as a Decayed Sum
Problem.

Definition 1 (Decayed Sum): Given the contact series $N_{uv}(t)$,
the goal is to estimate the decayed sum at any current time T

    $w_{uv}(T) = \sum_{i=1}^{N} f(i)\, g(T - off_i)$    (3)

where $f(i) = off_i - on_i$ denotes the ith contact duration and
$g(T - off_i)$ denotes the decay function. In this paper, we set
$g(T - off_i) = e^{-\beta (T - off_i)}$, since the inter-contact
time between nodes generally follows an exponential decay [20].
Hence, Eq. (3) can be reformulated as

    $w_{uv}(T) = \sum_{i=1}^{N} (off_i - on_i)\, e^{-\beta (T - off_i)}$    (4)
We next analyze the space complexity of DAG. Obviously, exact
tracking of $w_{uv}(T)$ needs $\Theta(N)$ storage bits. Considering
the scalability issue (in general, $N \gg n$), we should further
reduce the storage overhead while keeping the same calculation
precision.

Let $h(t) = off_i - on_i$ if and only if t equals $off_i$;
otherwise, $h(t) = 0$. We obtain the following lemma.

Lemma 1: In a continuous interval [0, T], Eq. (4) is equivalent to
the following Eq. (5)

    $w_{uv}(T) = \sum_{t \leq T} h(t)\, e^{-\beta (T - t)}$    (5)

Proof: Let us split the interval [0, T] into two disjoint parts
$T_1$ and $T_2$, where $T_1 = \bigcup_{i=1}^{N} t_i$ ($t_i =
[on_i, off_i]$) and $T_1 \cup T_2 = [0, T]$. We have

    $\sum_{t \leq T} h(t)\, e^{-\beta (T - t)} = \sum_{t \in T_1} h(t)\, e^{-\beta (T - t)} + \sum_{t \in T_2} h(t)\, e^{-\beta (T - t)}$

For any $t \in T_2$, since $t \neq off_i$ (note that $off_i \in
T_1$ and $T_1 \cap T_2 = \emptyset$), we have $h(t) = 0$. Hence,

    $\sum_{t \leq T} h(t)\, e^{-\beta (T - t)} = \sum_{t \in T_1} h(t)\, e^{-\beta (T - t)}$
    $= \sum_{i=1}^{N} \sum_{t \in t_i} h(t)\, e^{-\beta (T - t)} = \sum_{i=1}^{N} (off_i - on_i)\, e^{-\beta (T - off_i)}$
    $= w_{uv}(T)$

■

Theorem 1: At each time slot t = 0, 1, 2, ..., T, the value of
$w_{uv}(T)$ can be maintained easily using

    $w_{uv}(T) = h(T) + e^{-\beta}\, w_{uv}(T-1)$    (6)

Proof: From Lemma 1, we have

    $w_{uv}(T) = \sum_{t \leq T} h(t)\, e^{-\beta (T - t)}$
    $= h(T)\, e^{-\beta (T - T)} + \sum_{t \leq T-1} h(t)\, e^{-\beta (T - t)}$
    $= h(T) + \sum_{t \leq T-1} h(t)\, e^{-\beta (T - 1 - t) - \beta}$
    $= h(T) + e^{-\beta} \sum_{t \leq T-1} h(t)\, e^{-\beta (T - 1 - t)}$
    $= h(T) + e^{-\beta}\, w_{uv}(T-1)$

■
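
As a numerical illustration of Theorem 1, the following sketch
maintains the edge weight with a single counter via Eq. (6) and
compares it with the direct sum of Eq. (4). It assumes integer time
slots and contact end times; the contact list and the function names
are hypothetical.

```python
import math


def direct_sum(contacts, T, beta=1.0):
    """Eq. (4): durations damped by exp(-beta * (T - off_i))."""
    return sum((off - on) * math.exp(-beta * (T - off))
               for on, off in contacts if off <= T)


def track_weight(contacts, T, beta=1.0):
    """Eq. (6): w(T) = h(T) + exp(-beta) * w(T - 1), one counter only."""
    h = {}
    for on, off in contacts:
        # h(t) is non-zero only at the end moment of a contact.
        h[off] = h.get(off, 0.0) + (off - on)
    w = 0.0
    for t in range(T + 1):
        w = h.get(t, 0.0) + math.exp(-beta) * w
    return w


# Hypothetical (on_i, off_i) contact series between two nodes.
contacts = [(0, 2), (5, 6), (7, 9)]
w_direct = direct_sum(contacts, T=10, beta=0.3)
w_tracked = track_weight(contacts, T=10, beta=0.3)
```

Both values agree up to floating-point error, while the recursive
form never needs to store the contact history.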

From Theorem 1, each node only requires a single counter to
exactly track the relationship between itself and any other node,
which forms the row vector $w_u$ of matrix W. As such, when two
nodes meet each other, they can update their own matrix W by
swapping such row vectors. Note that to keep the latest row vector
$w_u$ of matrix W, each node carries a list Recent_Time(n) as an
indicator to record the last time when the corresponding $w_u$
(u = 1, 2, ..., n) was updated. Based on this indicator, they only
need to swap the latest $w_u$ with each other together with
Recent_Time(n). After that, they update their own W and
Recent_Time(n), respectively.
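
One possible reading of this exchange is sketched below: for every
row index, a node keeps whichever copy carries the more recent
timestamp. The dictionary layout and the name `merge_rows` are
assumptions made for illustration; the source does not prescribe a
data structure.

```python
def merge_rows(W_mine, recent_mine, W_peer, recent_peer):
    """Adopt the peer's row w_r whenever the peer's copy is fresher."""
    for r, t_peer in recent_peer.items():
        if t_peer > recent_mine.get(r, float("-inf")):
            W_mine[r] = list(W_peer[r])   # copy the fresher row vector
            recent_mine[r] = t_peer       # and its update time


# Hypothetical views held by two nodes a and b when they meet.
W_a = {0: [0.0, 1.0], 1: [0.5, 0.0]}
recent_a = {0: 10, 1: 3}
W_b = {0: [0.0, 0.2], 1: [2.0, 0.0]}
recent_b = {0: 4, 1: 8}

merge_rows(W_a, recent_a, W_b, recent_b)  # a learns from b
merge_rows(W_b, recent_b, W_a, recent_a)  # b learns from a
```

After the two calls both nodes hold the same, freshest rows.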



III. RELAYING ALGORITHM 

In this section, we first present our forwarding scheme OFPC in
Section III.A, and then discuss how to quantify the partial
centrality metric and detect the overlapped community structure in
Section III.B and Section III.C, respectively.

A. Opportunistic Forwarding with Partial Centrality 

We here present the OFPC algorithm. OFPC combines the 
knowledge of node partial centrality and that of overlapped 
community structure to make informed forwarding decisions. 
There are two intuitions behind this algorithm. First, the same 
person plays different social roles relative to different groups. 
Hence, one component of OFPC is to forward messages to 
nodes with higher partial centrality metrics to the destination 
communities than the current relay. Second, people show 
different social behaviors in society. Some tend to form one 
clique in their social lives. Others like to join multiple cliques. 
What is more, a few people prefer to stay at home. Therefore,
the other component of OFPC is to make different forwarding 
decisions based on the various types of nodes. The two 
components together form the algorithm. For this algorithm, 
we classify nodes into three categories 1) strong nodes (nodes 
only belonging to one community), 2) bridging nodes (nodes 
belonging to multiple communities) and 3) noise nodes (nodes 
not belonging to any community). Please refer to Section III.C 
for more formal definitions. 

We next describe the baseline implementation of OFPC. Take node u
and node v as examples. Suppose node u meets node v. For any
message m that u carries, if its destination $m_d$ is node v, node
u delivers it to node v and removes it from u's message queue.
Otherwise, if node v does not hold this message, node u makes
different forwarding decisions based on the categories they belong
to.

(1) Node v is a noise node: Node u does not forward the 
message m to node v. 

(2) Node u is a noise node, but node v is a strong or bridging
node: Node u forwards m to node v and deletes m from its buffer.

(3) Neither u nor v is a noise node: In this situation, if the
message m has not been delivered to the community that the
destination belongs to, it is forwarded to nodes with higher
partial centrality metrics (relative to the community partners of
the destination) than the current relay, until it reaches a node
which shares a community with the destination node. Then the
message is only forwarded to community members with higher partial
centrality metrics until the destination receives it or it expires.
Furthermore, to further reduce cost, the original carriers can
clear m from their buffers whenever m enters the community^2.
Algorithm 1 outlines the above process, where $\emptyset$ denotes
the empty set, $PC_u$ is the partial centrality of node u, Com(u)
denotes node u's set of community labels, and $Com(u) \cap
Com(m_d) \neq \emptyset$ denotes that the two nodes u and $m_d$
belong to the same community; otherwise, they do not share a
community.

2 When node v carries m as well, and node v is one of the community
partners of the destination, node u can delete m from its buffer.
Note that, to prevent the situation where node u occasionally moves
out of the same community, node u deletes m only if $w_{u m_d} <
w_{v m_d}$ and node v carries m (as shown in line 16, Algorithm 1).

Algorithm 1 OFPC, pseudo-code of node u
1: upon meeting node v do
2: for any message m in its queue do
3:   if (m ∉ v.queue) then
4:     if (neither u nor v is a noise node) then
5:       if (Com(u) ∩ Com(m_d) == ∅) and (Com(v) ∩ Com(m_d) == ∅) and (PC_u < PC_v) then
6:         m → v
7:       end if
8:       if (Com(u) ∩ Com(m_d) == ∅) and (Com(v) ∩ Com(m_d) ≠ ∅) then
9:         m → v
10:      end if
11:      if (Com(u) ∩ Com(m_d) ≠ ∅) and (Com(v) ∩ Com(m_d) ≠ ∅) and (PC_u < PC_v) then
12:        m → v
13:      end if
14:    end if
15:  end if
16:  if (m ∈ v.queue) and (Com(v) ∩ Com(m_d) ≠ ∅) and (Com(u) ∩ Com(m_d) == ∅) and (w_{u m_d} < w_{v m_d}) then
17:    u.Remove(m)
18:  end if
19: end for
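
The forwarding test for two non-noise nodes (the three conditions
inside the noise check of Algorithm 1) can be sketched as a single
predicate. The dictionary fields `noise`, `com` and `pc` are
hypothetical names, and `pc` is assumed to already hold the partial
centrality relative to the destination's community partners.

```python
def should_forward(u, v, dest_coms):
    """Return True if u should hand message m to the encountered node v."""
    if u["noise"] or v["noise"]:
        return False                       # handled by cases (1) and (2) in the text
    u_in = bool(u["com"] & dest_coms)      # u shares a community with m_d
    v_in = bool(v["com"] & dest_coms)      # v shares a community with m_d
    if not u_in and v_in:
        return True                        # m enters the destination community
    if u_in == v_in:
        return u["pc"] < v["pc"]           # climb the partial centrality gradient
    return False                           # never leave the destination community


# Hypothetical encounter; the destination belongs to community 3.
dest = {3}
outsider_low = {"noise": False, "com": {1}, "pc": 0.2}
outsider_high = {"noise": False, "com": {2}, "pc": 0.5}
member = {"noise": False, "com": {3}, "pc": 0.1}
idle = {"noise": True, "com": set(), "pc": 0.0}
```

The sketch covers only the decision to forward; buffer cleanup
(line 16 of Algorithm 1) and delivery to the destination itself are
left out.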

We next detail the evaluation of partial centrality and the 
detection of overlapped community structure in the following 
two sections, respectively. 

B. Evaluating Partial Centrality 

As aforementioned, we mainly focus on the partial centrality of a
node relative to the community members. Traditional solutions for
computing node centrality are not applicable, due to the unknown
number of neighbors and the vulnerable end-to-end path in
opportunistic networks. To deal with this issue, we use Principal
Component Analysis (PCA) [11] to evaluate the partial centrality
metric and, correspondingly, to detect the overlapped community
structure.

Principal Component Analysis: Principal component analysis is a
powerful tool to extract relevant information from a data set by
filtering noise and redundant data. This relevant information
reveals the hidden, simplified structures underlying the data set.
We summarize the principle of PCA as follows.

Suppose that a node u has built the matrix W from its view of the
DAG (please refer to Section II.B), and the matrix W has been
centralized (i.e., the corresponding mean has been subtracted from
each column). Let $C_W = W^T W/(n-1)$ denote the covariance matrix
of W. Let us further diagonalize $C_W$ as

    $P^T C_W P = \Lambda$    (7)

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots,
\lambda_n)$ and P is a normalized orthogonal matrix. Let $x_i$ be
the eigenvectors of $C_W$ and $\lambda_i$ the corresponding
eigenvalues, with $\lambda_1 > \lambda_2 > \ldots > \lambda_n$. We
can see from Fig. 2 that the row vector $\alpha_u = (\alpha_{u1},
\alpha_{u2}, \ldots, \alpha_{un})$ denotes the distribution of node
u in the n-dimensional spectral space, and the column vector $x_i =
(\alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{ni})^T$ denotes the
coordinates of all of the nodes in the ith dimension of the
spectral space. In addition, once we get the orthogonal matrix P,
we generally select the top k-dimensional spectral space $(x_1,
x_2, \ldots, x_k)$ as the principal component of W, since the
corresponding top k eigenvalues dominate the spectral graph
features [21]. Algorithm 2 describes the above computation process
and Table I lists the main notations used in the paper.

Fig. 2. The spectral space of W and its vector representation.

TABLE I
The main notations used in the paper

Notation            Explanation
G                   The decayed aggregation graph
W                   The adjacency matrix of graph G
w_u                 The row vector of matrix W
C_W                 The covariance matrix of W
P_{k+1}             The noise components of W
P_k                 The principal components of W
P                   The eigenvector decomposition of C_W
W_k                 The dimensionality reduction matrix of W
C_{W_k}             The covariance matrix of W_k
Λ                   The diagonal matrix of C_W
λ_i                 The ith eigenvalue of C_W
x_i                 The ith eigenvector of C_W
α_u                 The distribution of node u in the n-dimensional
                    spectral space
α_u^{1,k}           The signal distribution of α_u
α_u^{k+1,n}         The noise distribution of α_u

Algorithm 2 PCA
1: Input: an adjacency matrix W of DAG
2: Output: orthogonal matrix P and diagonalized matrix Λ
3: W ← centralized(W)
4: C_W = cov(W)
5: [P, Λ] ← eigs(C_W, n)
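
The three steps of Algorithm 2 might be written with NumPy as
follows. This is a sketch under the assumption that `numpy.linalg.eigh`
stands in for the `eigs` call of the listing; the 3 x 3 weight matrix
is hypothetical.

```python
import numpy as np


def pca(W):
    """Centre W, form its covariance matrix and diagonalize it."""
    W = np.asarray(W, dtype=float)
    Wc = W - W.mean(axis=0)               # subtract the mean of each column
    C = Wc.T @ Wc / (W.shape[0] - 1)      # covariance matrix C_W
    lam, P = np.linalg.eigh(C)            # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]         # reorder so that lambda_1 is largest
    return P[:, order], lam[order], C


# Hypothetical symmetric weight matrix of a 3-node DAG.
W = np.array([[0.0, 2.0, 0.1],
              [2.0, 0.0, 0.3],
              [0.1, 0.3, 0.0]])
P, lam, C = pca(W)
alpha_0 = P[0, :]   # row of P: distribution of node 0 in the spectral space
```

The columns of `P` play the role of the eigenvectors $x_i$ and the
rows that of the vectors $\alpha_u$.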

Mathematically, let matrix $P_k = (x_1, x_2, \ldots, x_k)$ and
$\Lambda_k = \mathrm{diag}(\lambda_1, \lambda_2, \ldots,
\lambda_k)$. Let $\alpha_u^{+} = (|\alpha_{u1}|, |\alpha_{u2}|,
\ldots, |\alpha_{ui}|, \ldots, |\alpha_{uk}|)$, where
$|\alpha_{ui}|$ denotes the absolute value of $\alpha_{ui}$. We
have

Lemma 2: For a given decayed aggregation graph G with k
communities, the matrix $P_k$ is the projection matrix, and the
vector $\alpha_u^{+}$ represents the likelihood of node u's
attachment to the k communities.
Proof: Let $W_k$ denote the dimensionality reduction matrix of W
and the matrix $C_{W_k}$ be the covariance matrix of $W_k$. Based
on the theory of PCA, $C_{W_k}$ should be diagonalized as well. We
have

    $C_{W_k} = \frac{W_k^T W_k}{n-1} = \Lambda_k$    (8)

On the other hand, from Eq. (7), we get

    $P^T C_W P = \Lambda \Rightarrow P_k^T C_W P_k = \Lambda_k$    (9)

Replacing $\Lambda_k$ with Eq. (8) and $C_W$ with $W^T W/(n-1)$,
respectively, we have

    $\frac{W_k^T W_k}{n-1} = P_k^T C_W P_k \Rightarrow \frac{W_k^T W_k}{n-1} = \frac{P_k^T W^T W P_k}{n-1}$

Multiplying both sides by (n − 1) and using the substitution
$P_k^T W^T = (W P_k)^T$, we get

    $W P_k = W_k$    (10)

Hence, we conclude that $|\alpha_{ui}|$ is the projection length of
node u in community i. ■
Theorem 2 (Node u's partial centrality I): Let $PC_u^i$ denote the
partial centrality of node u with respect to a community i. We have

    $PC_u^i = |\alpha_{ui}|\, \lambda_i$    (11)

Proof: From Lemma 2, we know that the likelihood of node u's
attachment to the community i equals $|\alpha_{ui}|$, and from
spectral graph theory [21], it has been shown that the eigenvalue
$\lambda_i$ indicates the strength of community i in the graph G.
Hence, we get $PC_u^i = |\alpha_{ui}|\, \lambda_i$. ■

Similarly, if node u belongs to multiple communities, we have the
following Theorem 3.

Theorem 3 (Node u's partial centrality II): Let $PC_u$ denote the
partial centrality of node u and $k_u$ denote the number of
communities including node u ($k_u < k$). We have



I A* 



(12) 



C. Detecting the Overlapped Community Structure 

Cutting a graph into small clusters has been studied widely. We
test k-means, one of the most well-known clustering algorithms
[22], by extending it to an overlapped community structure. The
advantage of the k-means algorithm compared to other methods such
as CNM [23] and k-clique [24] is that it does not need to know the
neighbor relationship between nodes, and only requires the
adjacency matrix of a weighted graph such as the DAG, while CNM and
k-clique are more appropriate for a binary graph. In addition,
based on the PCA technique discussed above, we can confidently
determine the number of communities, the initial elements for each
community and the termination condition, three issues that strongly
affect the performance of k-means. We next discuss how to detect
the overlapped community structure based on the refined k-means.

Determining k, the number of communities: PCA provides a roadmap
to reduce a confusing data set to a lower dimension that retains
the main features of the original data set. The rationale behind
this is that the eigenvalues of a network play a big role in many
important graph features. It has been shown that the maximum
degree, clique number, and even the randomness of a graph are all
related to $\lambda_1$. In general, we select the top k
eigenvectors to denote the main structures of the graph, where the
value of k satisfies



    $\sum_{i=1}^{k} \lambda_i \Big/ \sum_{i=1}^{n} \lambda_i \geq R$    (13)

and the ratio R usually belongs to the interval [0.7, 0.9] [11].
In this paper, we set R = 0.85, which we believe is enough to
characterize the main structures of a network (please refer to
Section IV.A).
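
A minimal sketch of this selection step follows. It assumes that the
criterion is the usual PCA rule, i.e., the smallest k whose top-k
eigenvalues account for at least a fraction R of the eigenvalue sum;
that reading, the function name `choose_k` and the eigenvalues below
are assumptions for illustration.

```python
def choose_k(eigenvalues, R=0.85):
    """Smallest k whose leading eigenvalues reach the ratio R of the total."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running >= R * total:
            return k
    return len(lam)


# Hypothetical spectrum of a covariance matrix C_W.
k_default = choose_k([5.0, 3.0, 1.0, 1.0])         # cumulative ratios 0.5, 0.8, 0.9
k_loose = choose_k([5.0, 3.0, 1.0, 1.0], R=0.7)    # 0.8 already exceeds 0.7
```

A larger R keeps more eigenvectors and therefore yields more, and
typically smaller, communities.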

Identifying the noise nodes: PCA divides a network into two
different parts: 1) the principal components $P_k$, and 2) the
opposite $P_{k+1}$, where $P_{k+1} = (x_{k+1}, x_{k+2}, \ldots,
x_n)$, as shown in Fig. 2. We call the latter the noise components
of the network. Accordingly, we divide the row vector $\alpha_u$
into $\alpha_u^{1,k} = (\alpha_{u1}, \alpha_{u2}, \ldots,
\alpha_{uk})$ and $\alpha_u^{k+1,n} = (\alpha_{u,k+1},
\alpha_{u,k+2}, \ldots, \alpha_{un})$, the signal and noise of node
u. The following Definition 2 helps to identify whether a node is a
noise node or not.

Definition 2 (Node u's signal-to-noise ratio $SNR_u$):

    $SNR_u = \sum_{i \in [1,k]} (\lambda_i \alpha_{ui})^2 \Big/ \sum_{j \in [k+1,n]} (\lambda_j \alpha_{uj})^2$

From Theorem 2, we know that node u's partial centrality relative
to community i is $|\alpha_{ui}|\, \lambda_i$, which is also the
amplitude of node u's signal in the ith dimension of the spectral
space. Hence, the signal energy $e_u^{signal}$ of node u can be
expressed as $e_u^{signal} = \sum_{i \in [1,k]} (\lambda_i
|\alpha_{ui}|)^2 = \sum_{i \in [1,k]} (\lambda_i \alpha_{ui})^2$,
and the noise strength $e_u^{noise}$ equals $\sum_{j \in [k+1,n]}
(\lambda_j \alpha_{uj})^2$. Based on Definition 2, we propose the
following definition.

Definition 3 (Noise Nodes): The node u is a noise node if its
$SNR_u$ satisfies $SNR_u < 1$.
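
Definitions 2 and 3 translate into a short check. In the sketch
below, `alpha_u` and `lam` are plain lists ordered by decreasing
eigenvalue, and returning infinity when the noise energy is zero is
an assumption added here to avoid a division by zero; all values are
hypothetical.

```python
def snr(alpha_u, lam, k):
    """Signal energy over the top k dimensions divided by the noise energy."""
    signal = sum((lam[i] * alpha_u[i]) ** 2 for i in range(k))
    noise = sum((lam[j] * alpha_u[j]) ** 2 for j in range(k, len(lam)))
    return float("inf") if noise == 0 else signal / noise


def is_noise_node(alpha_u, lam, k):
    """Definition 3: a node is a noise node when its SNR is below 1."""
    return snr(alpha_u, lam, k) < 1


# Hypothetical spectrum with k = 2 principal dimensions out of n = 4.
lam = [4.0, 2.0, 1.0, 0.5]
well_attached = [0.5, 0.5, 0.1, 0.1]   # energy concentrated in the signal part
loosely_attached = [0.01, 0.0, 0.9, 0.4]  # energy concentrated in the noise part
```

Because each coordinate is weighted by its eigenvalue, a node needs
only a modest coordinate in a strong community to pass the test.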

Determining the initial elements for each community: After we have
ascertained the number of communities and excluded the noise nodes,
the next step is to determine the initial centroid $m_i$ (i = 1, 2,
..., k) for each community. We select the node u, s.t. $\max
|\alpha_{ui}|$ (u = 1, 2, ..., n) for each eigenvector $x_i$, as
the initial node of community i, and set $m_i = \alpha_u$.
Algorithm 3 describes this procedure.

Algorithm 3 The Initial Centroid
1: Input: P_k, maxValue ← 0, v ← 0
2: Output: m_i (i = 1, 2, ..., k)
3: for i=1 to k do
4:   maxValue = |α_1i|
5:   v ← 1 {Tracking who is the maximum}
6:   for u=2 to n do
7:     if |α_ui| > maxValue ∧ u is not a noise node then
8:       maxValue = |α_ui|, v ← u
9:     end if
10:  end for
11:  C_i ← C_i ∪ {v}, m_i ← α_v
12:  Com(v) ← Com(v) ∪ {i}
13: end for
Termination condition of k-means: Suppose all of the non-noise
nodes have been clustered, and the centroid $m_i$ is updated by

    $m_i = \frac{1}{n_i} \sum_{u \in C_i} \alpha_u$    (14)

where $n_i$ is the number of nodes only belonging to $C_i$ (i.e.,
node u is a strong node, see Definition 4). k-means is
characterized by minimizing the sum of squared errors,

    $J = \sum_{i=1}^{k} \sum_{u \in C_i} \|\alpha_u - m_i\|^2$    (15)

It has been shown that the standard iterative method for k-means
suffers seriously from the local minima problem, because of the
greedy nature of the update strategy. Fortunately, Theorem 4
guarantees that the PCA-based k-means is immune to this problem.

Theorem 4 (Theorem 3.2 of [25]): Minimizing J is equivalent to
maximizing $\mathrm{trace}(P^T C_W P)$ (please refer to Eq. (19) of
[25]), and $\max \mathrm{trace}(P^T C_W P) = \lambda_1 + \lambda_2
+ \ldots + \lambda_k$.

In other words, the PCA-based k-means has reached the optimal
performance once we cluster all of the non-noise nodes for the
first time.

Detecting the overlapped community structure: In this paper, we
allow a non-noise node to join multiple communities, and classify
such nodes into the following two categories: 1) strong nodes and
2) bridging nodes.

Definition 4 (Strong Nodes): A node u is a strong node if it only
belongs to one community.

Definition 5 (Bridging Nodes): A node u is a bridging node if it
joins two or more communities.

We now discuss how to identify the strong nodes and bridging nodes
online, based on the following steps.

(1) Clustering nodes: For any node u, we compute the distance
between itself and the centroid $m_i$, $dist(\alpha_u, m_i)$, and
select i, s.t. $\min dist(\alpha_u, m_i)$ (i = 1, 2, ..., k), as
the community node u belongs to, where

    $dist(\alpha_u, m_i) = \theta(u, i) = \arccos \frac{\alpha_u m_i^T}{\|\alpha_u\|_2 \|m_i\|_2}$

and $\theta(u, i)$ denotes the angle between $\alpha_u$ and $m_i$.
We label node u a strong node and update $m_i$ by Eq. (14). For the
other $\theta(u, j)$ (j ≠ i, j = 1, 2, ..., k), we label node u a
bridging node belonging to community j if and only if $\theta(u, j)
\in [\pi/4 - \varphi, \pi/4 + \varphi]$ (please refer to Theorem
5), where $\varphi$ is the overlapped coefficient. Algorithm 4
describes the clustering procedure.

(2) Adjusting the categories of nodes: After step (1) finishes,
the explicit community structure has been detected together with
the blurred labels of nodes: some nodes with "strong" labels may
share multiple communities, some with "bridging" labels may only
belong to one community, and some may even be labelled strong and
bridging simultaneously. To this end, we need to re-classify each
node. Let |Com(u)| denote the number of communities node u belongs
to. Algorithm 5 presents the adjusting process.

Determining the overlapped interval $[\pi/4 - \varphi, \pi/4 +
\varphi]$: This section focuses on why we set the overlapped
interval $[\pi/4 - \varphi, \pi/4 + \varphi]$.



Algorithm 4 Clustering nodes
1: for u=1 to n do
2:   for i=1 to k do
3:     Computing dist(α_u, m_i)
4:   end for
5:   Selecting i, s.t. min θ(u,i) (i = 1, 2, ...k)
6:   C_i ← C_i ∪ {u}
7:   Com(u) ← Com(u) ∪ {i}
8:   Updating m_i
9:   // Identifying bridging nodes
10:  for other θ(u,j) (j = 1, 2, ...k) do
11:    if θ(u,j) ∈ [π/4 − φ, π/4 + φ] then
12:      C_j ← C_j ∪ {u}
13:      Com(u) ← Com(u) ∪ {j}
14:    end if
15:  end for
16: end for
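
The labelling of a single node can be sketched as follows. To keep
the example short, the centroids are held fixed (the centroid update
of Algorithm 4 is omitted) and the final strong/bridging label is
derived directly from the number of communities joined; the vectors,
the default value of `phi` and the function names are assumptions for
illustration.

```python
import math


def angle(a, m):
    """theta(u, i): angle between a spectral vector and a centroid."""
    dot = sum(x * y for x, y in zip(a, m))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in m))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding errors


def label_node(alpha_u, centroids, phi=0.027 * math.pi):
    """Return the communities joined by the node and its category."""
    thetas = [angle(alpha_u, m) for m in centroids]
    primary = min(range(len(thetas)), key=thetas.__getitem__)
    coms = {primary}                       # closest centroid
    for j, theta in enumerate(thetas):
        if j != primary and abs(theta - math.pi / 4) <= phi:
            coms.add(j)                    # inside the overlapped interval
    kind = "bridging" if len(coms) >= 2 else "strong"
    return coms, kind


# Hypothetical 2-dimensional spectral space with two centroids.
centroids = [(1.0, 0.0), (0.0, 1.0)]
strong_case = label_node((1.0, 0.05), centroids)
bridging_case = label_node((1.0, 0.98), centroids)
```

A vector lying almost on one centroid line is labelled strong, while
a vector close to the diagonal joins both communities.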



Algorithm 5 Adjusting node's categories
1: for u=1 to n do
2:   if |Com(u)| ≥ 2 ∧ u is a strong node then
3:     u is a bridging node
4:   end if
5:   if |Com(u)| == 1 ∧ u is a bridging node then
6:     u is a strong node
7:   end if
8: end for



Lemma 3: Strong nodes from k communities form k quasi-orthogonal
lines in the spectral space.

Proof: From Definition 4 and the clustering process mentioned
above, we know that the centroid $m_i = (m_{1i}, m_{2i}, \ldots,
m_{ni})$ can approximately represent the line formed by strong
nodes within the ith community (please refer to Eq. (14)). On the
other hand, the virtual centroid vector $\hat{m}_i$ should be close
to eigenvector $x_i$. This is mainly because $m_i \approx \hat{m}_i
= (\sum_{u \in C_i} \alpha_{ui})/n_i$, as $\alpha_{ui}$ is the
dominant part of $\alpha_u$. Hence, $m_i$ is located on the line
formed by the eigenvector $x_i$. The conclusion follows since
different eigenvectors are linearly independent. ■




Fig. 3. Overlapped coefficient and confidence interval.

Theorem 5 (Identifying Bridging Nodes): A node u can join two
communities i and j if $\theta(u, i), \theta(u, j) \in [\pi/4 -
\varphi, \pi/4 + \varphi]$.

Proof: From Lemma 3, bridging nodes are not significantly close to
any of the lines. We believe that a "theoretic" bridging node
should be located along the diagonal of a 2-dimensional space, as
shown in Fig. 3. That is, the angle between $\alpha_u$ and $C_i$
should equal $\pi/4$. Considering practical situations, we set the
confidence interval $[\pi/4 - \varphi, \pi/4 + \varphi]$ in this
paper, and develop an overlapped coefficient $\varphi$ to
adaptively adjust the width of the interval. ■

IV. Data-sets and Experimental Results 

We first analyze the overlapped community structures under- 
lying the data-sets we used, and then compare the performance 
of OFPC with two state-of-the-art works: Bubble and Prophet 
|4| together with the Epidemic BP and Direct Contact Q al- 
gorithms as benchmarks. Bubble is a well-known social-based 
forwarding algorithm and Prophet is currently an IETF draft [30].
Results of the Epidemic and Direct Contact algorithms provide the
upper and lower bounds of the important performance evaluation
metrics: mean delivery delay, cost and packet delivery ratio.

A. Data-sets 

We use the following three real data-sets gathered by [26], [27],
referred to as North Carolina State Fair, NCSU, and KAIST. The
characteristics of these data-sets, such as the intra/inter-contact
distribution, have been explored in several studies (e.g., [26],
[27]) and applied in different scenarios [28], [29]. Interestingly,
by analyzing these traces, we find that they cover a rich diversity
of environments, ranging from a well connected scenario (Statefair)
to a quite sparse situation (NCSU). The general statistics of the
three data-sets are summarized in Table II.

TABLE II
Statistics of collected real traces from three sites

Site        No. of trajectories   Volunteers   Start date / end date
Statefair   19                    18           2006-10-24 / 2007-10-21
NCSU        35                    20           2006-08-26 / 2006-11-16
KAIST       92                    34           2006-09-26 / 2007-10-03


TABLE III

Statistics of community structure underlying the three
scenarios ($\beta = 1$, $\varphi = 0.027 \approx 5^\circ$)

Site        Maximum   Minimum   Mean   Variance
Statefair   7         3         5.14   0.8661
NCSU        11        4         7.97   3.5931
KAIST       9         1         5.25   6.1905


Overlapped community structures underlying the data-sets:
Fig. 4 illustrates the number of communities underlying
the three scenarios at different moments. We observe that
Statefair exhibits a more stable topology than KAIST
and NCSU. The variance of the number of communities is
0.8661 at Statefair, while those at KAIST and NCSU are
6.1905 and 3.5931, respectively, as shown in Table III, where the
maximum, minimum and mean of the number of communities
are presented as well. In addition, Table IV presents the average
ratio of the noise, bridging and strong nodes in the three
scenarios. We observe that NCSU shows the poorest connectivity,
with 7 ($\lfloor 21.23\% \times 35 \rfloor$) noise nodes. At the same
time, we notice that overlapped community structures do exist
in these data-sets, since there are 6
($\lfloor 7.45\% \times 92 \rfloor$) bridging nodes at KAIST,
3 ($\lfloor 11.02\% \times 35 \rfloor$) bridging nodes at NCSU,
and 2 ($\lfloor 11.39\% \times 19 \rfloor$) bridging nodes at Statefair.
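
The node counts quoted in this paragraph follow from taking the floor of each percentage in Table IV applied to the node totals in Table II. The short check below reproduces that arithmetic. Like the text, it treats the number of trajectories as the number of nodes; the helper name `node_count` is only illustrative.

```python
import math

# Node totals (number of trajectories, Table II) and percentages (Table IV).
TOTAL_NODES = {"Statefair": 19, "NCSU": 35, "KAIST": 92}
BRIDGING_PCT = {"Statefair": 11.39, "NCSU": 11.02, "KAIST": 7.45}
NOISE_PCT = {"NCSU": 21.23}

def node_count(percent, total):
    """Floor of percent% of total, i.e. the integer count quoted in the text."""
    return math.floor(percent / 100.0 * total)

bridging = {site: node_count(pct, TOTAL_NODES[site])
            for site, pct in BRIDGING_PCT.items()}
noise = {site: node_count(pct, TOTAL_NODES[site])
         for site, pct in NOISE_PCT.items()}

print(bridging)  # {'Statefair': 2, 'NCSU': 3, 'KAIST': 6}
print(noise)     # {'NCSU': 7}
```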




Fig. 4. The number of communities at different moments (x-axis: time in seconds).

TABLE IV

Average ratio of the number of noise, bridging and strong
nodes ($\beta = 1$, $\varphi = 0.027 \approx 5^\circ$)

Site        Noise nodes (%)   Bridging nodes (%)   Strong nodes (%)
Statefair   5.36              11.39                83.25
NCSU        21.23             11.02                67.75
KAIST       9.69              7.45                 82.86


B. Simulation Setup 

We use the aforementioned three real data-sets to test
the premise of a forwarding scheme based on social structures.
For each data-set, a randomly chosen source node sends a
message to a randomly chosen destination node, and a total of
1000 messages are generated. The nodal transmission range
is set to 250 m, a typical value for WiFi, and the emulation
results are averaged over 50 runs for statistical confidence.
In addition, we compare OFPC against the optimized versions
of Bubble and Prophet. We use an offline method to compute
the betweenness centrality of each node for Bubble (i.e., we
first flood a large number of messages in the network, and
count the number of times a node acts as a relay for other
nodes on all the shortest paths [13]), and take the default
experimental parameters for Prophet [4].
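
The offline relay-counting step for Bubble can be sketched as follows. This is a simplified stand-in and not the authors' procedure: it assumes a static, undirected, unweighted contact graph in place of flooding over the time-varying traces, and it counts every shortest path between each unordered pair of nodes. The function name `shortest_path_relay_counts` and the sample graph are hypothetical.

```python
from collections import deque, defaultdict

def shortest_path_relay_counts(adj):
    """For each node, count how often it is an intermediate hop on a shortest
    path between two other nodes (each unordered pair is considered once).

    adj maps every node to a list of its neighbours (undirected graph).
    """
    counts = {v: 0 for v in adj}
    nodes = sorted(adj)
    for idx, s in enumerate(nodes):
        # BFS from s, recording all shortest-path predecessors of each node.
        dist = {s: 0}
        preds = defaultdict(list)
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    preds[w].append(v)
        for t in nodes[idx + 1:]:
            if t not in dist:
                continue  # t is unreachable from s
            # Walk predecessor lists back from t to enumerate each shortest path.
            stack = [(t, [])]
            while stack:
                v, relays = stack.pop()
                if v == s:
                    for relay in relays:
                        counts[relay] += 1
                    continue
                for p in preds[v]:
                    next_relays = relays if p == s else relays + [p]
                    stack.append((p, next_relays))
    return counts

# Star graph: every path between two leaves passes through the hub "b".
star = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
print(shortest_path_relay_counts(star))  # {'a': 0, 'b': 3, 'c': 0, 'd': 0}
```

Enumerating paths explicitly keeps the sketch short but scales poorly. For large graphs, an accumulation scheme such as Brandes' algorithm would be the usual choice.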

C. Performance Evaluation 

Mean delivery delay (MDD): Fig. 5 illustrates the mean
delivery delay under different message TTLs. OFPC clearly
speeds up message dissemination. For example, it achieves
up to a 70% improvement in MDD over Prophet and 40% over
Bubble at Statefair (Fig. 5(a)). The reason is that OFPC
uses the partial centrality metric to make forwarding
decisions. This novel metric characterizes the relations
between nodes at a fine-grained level, and thus helps
to choose more qualified relays than the centrality-based
scheme does.

Cost: Fig. 6 shows that OFPC has the best performance
in terms of cost as well. For example, at Statefair, OFPC
reduces the overhead by up to 2× and 3× compared with
Bubble and Prophet, respectively. Even at NCSU (Fig. 6(b)), the very
sparse scenario, OFPC still outperforms Bubble and Prophet.
This is mainly because 1) the partial centrality metric helps to
reduce the delivery delay, and in turn the cost, and
2) we exclude the noise nodes from the relay candidates, as
they are isolated and far away from the community members
(please refer to Section III-A).

Packet delivery ratio (PDR): Fig. 7 presents the packet
delivery ratio. In general, OFPC achieves a delivery ratio
similar to that of Bubble, and both outperform Prophet.
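
To make the three metrics concrete, the sketch below computes them from per-message simulation records. The exact definitions are not restated in this section, so the sketch rests on assumptions: a message counts as delivered only if it arrives within the TTL, MDD is the mean delay over delivered messages, and cost is the average number of transmissions per generated message. The `MessageRecord` type and `summarize` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MessageRecord:
    created_at: float               # generation time (seconds)
    delivered_at: Optional[float]   # first delivery time, None if never delivered
    transmissions: int              # relay transmissions (assumed cost unit)

def summarize(records, ttl):
    """Return MDD, PDR and cost for one run under the assumed definitions."""
    delays = [r.delivered_at - r.created_at for r in records
              if r.delivered_at is not None
              and r.delivered_at - r.created_at <= ttl]
    total = len(records)
    pdr = len(delays) / total if total else 0.0
    mdd = sum(delays) / len(delays) if delays else float("nan")
    cost = sum(r.transmissions for r in records) / total if total else 0.0
    return {"mdd": mdd, "pdr": pdr, "cost": cost}

run = [
    MessageRecord(0.0, 100.0, 3),
    MessageRecord(0.0, 500.0, 5),   # arrives after the TTL, so not delivered
    MessageRecord(0.0, None, 4),    # never delivered
    MessageRecord(10.0, 70.0, 2),
]
print(summarize(run, ttl=300.0))  # {'mdd': 80.0, 'pdr': 0.5, 'cost': 3.5}
```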

V. Related Work

Many opportunistic forwarding algorithms have been proposed
in the past. We classify them into the following two
categories based on the contexts they use.
Physical contact based: A. Vahdat and D. Becker first
proposed an epidemic forwarding style for partially connected
ad hoc networks [31]. They tried to seize every forwarding
opportunity, thus guaranteeing a high packet delivery ratio
at the expense of consuming more system resources. This deficiency has
motivated researchers to develop other forwarding mechanisms
(e.g., [7], [10]). For most of them, the networking performance
depends heavily on the contexts they use to identify
"the best" relay node to the destination. For example, A. Lindgren
et al. [4] presented Prophet, a probabilistic routing protocol for
opportunistic networks. They exploited past contact moments
to predict the probability of future encounters. Similarly, J.
Leguay et al. [7] proposed MobySpace, a high-dimensional
Euclidean space constructed from the past contact locations.
This scheme reduces the overhead but increases
the delivery delay.

Social contact based: The aforementioned physical
contact based schemes do not consider the social structures
evolving from human activities. With the recent
popularization of personal hand-held mobile devices, however,
human mobility plays an increasingly critical role in networking
performance, as human walks show a strong spatiotemporal
correlation (e.g., clustering) [9] instead of purely
random motions. Considering this fact, researchers have recently
focused on the influence of social structure on opportunistic
communication. For instance, E. Daly, P. Hui and A. Mtibaa
et al. [12], [13], [14] exploited social structures such
as centrality/similarity metrics to make forwarding decisions.
Messages are forwarded to nodes with relatively
high centrality/similarity metrics to increase the probability of
finding better relays to the final destination. For example, in
SimBet [12], each node evaluates its centrality and similarity
metrics based on the ego network technique, and a message it
carries is either forwarded to nodes having higher similarities
with the destination node, or stays with the most central node.
Similarly, in Bubble [13], a message is relayed across nodes
with increasing centrality metrics until it enters the range
of the destination community. In addition, A. Mtibaa et al.
proposed PeopleRank [14], which exploits the PageRank [16]
algorithm to evaluate node centrality; a message is only
forwarded to nodes with higher centralities than the
current carriers.

The main difference between our work and the
state-of-the-art works is that we explore the impact of the partial
centrality metric on the performance of opportunistic routing.
We believe that this novel metric helps to make informed
forwarding decisions.

VI. Conclusion and Future Work 

In this paper, we propose OFPC, a forwarding algorithm based
on the partial centrality metric, to improve the performance
of opportunistic routing. First, we formulate the strength of
the relationship between nodes as a Decayed Sum Problem, and
use a Decayed Aggregation Graph to model the opportunistic
network. Second, we present an online method to evaluate
node partial centrality by exploiting the theory of principal
component analysis (PCA). Third, we detect the overlapped
community structure by combining PCA with the k-means
algorithm. Finally, we validate the effectiveness of our
method by trace-driven simulation. One significant topic for
future work is to extend the partial centrality metric to other
important applications, such as recommendation systems and
worm containment in online social networks.